Cost Sensitive Discretization of Numeric Attributes
Abstract
Many algorithms in decision tree learning have not been designed to handle numerically valued attributes very well. Therefore, discretization of the continuous feature space has to be carried out. In this article we introduce the concept of cost-sensitive discretization as a preprocessing step to the induction of a classifier and as an elaboration of error-based discretization, yielding an optimal multi-interval splitting for each numeric attribute. We give a transparent description of the method and the steps involved in cost-sensitive discretization, and assess its performance against two other well-known methods, entropy-based discretization and pure error-based discretization, on an authentic financial dataset. From an algorithmic perspective, we show that an important deficiency of error-based discretization methods can be remedied by introducing costs. From the application perspective, we found that applying a discretization method is advisable. Finally, we use ROC curves to illustrate that under particular conditions cost-based discretization can be optimal.
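The cost-sensitive idea can be illustrated with a minimal sketch. This is not the paper's multi-interval algorithm but a hedged single-cut illustration for a two-class attribute: each interval predicts the class that minimizes its expected misclassification cost, and asymmetric costs (the parameter names `cost_fp` and `cost_fn` are our own, not from the paper) can shift the chosen cut away from the one a pure error-based method would pick.

```python
# Hedged sketch of cost-sensitive binary discretization (assumed setup, not
# the paper's exact method): labels are 0/1, a false positive costs cost_fp,
# a false negative costs cost_fn.

def interval_cost(labels, cost_fp, cost_fn):
    """Cost of labelling the whole interval with its cheaper class."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Predicting positive makes every negative a false positive; predicting
    # negative makes every positive a false negative. Take the cheaper option.
    return min(neg * cost_fp, pos * cost_fn)

def best_cut(values, labels, cost_fp, cost_fn):
    """Return (cut_point, total_cost) of the cheapest single binary split."""
    pairs = sorted(zip(values, labels))
    vs = [v for v, _ in pairs]
    ls = [l for _, l in pairs]
    best = (None, interval_cost(ls, cost_fp, cost_fn))  # no-split baseline
    for i in range(1, len(vs)):
        if vs[i] == vs[i - 1]:
            continue  # cut points only between distinct values
        cost = (interval_cost(ls[:i], cost_fp, cost_fn)
                + interval_cost(ls[i:], cost_fp, cost_fn))
        if cost < best[1]:
            best = ((vs[i - 1] + vs[i]) / 2, cost)
    return best
```

With `cost_fp == cost_fn` this reduces to pure error-based discretization; unequal costs are what let the method avoid the deficiency the abstract refers to, where error-based splitting ignores how expensive each kind of mistake is.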
Similar Resources
A Hellinger-based discretization method for numeric attributes in classification learning
Many classification algorithms require that training examples contain only discrete values. In order to use these algorithms when some attributes have continuous numeric values, the numeric attributes must be converted into discrete ones. This paper describes a new way of discretizing numeric values using information theory. Our method is context-sensitive in the sense that it takes into accoun...
Chi2: feature selection and discretization of numeric attributes
Discretization can turn numeric attributes into discrete ones. Feature selection can eliminate some irrelevant attributes. This paper describes Chi2, a simple and general algorithm that uses the χ² statistic to discretize numeric attributes repeatedly until some inconsistencies are found in the data, and achieves feature selection via discretization. The empirical results demonstrate that Chi2 i...
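The chi-square merging idea behind this family of methods can be sketched in a ChiMerge-style loop: start with one interval per distinct value and repeatedly merge the adjacent pair whose 2 × k class-contingency table has the lowest χ² statistic. Note this is a hedged simplification: a fixed significance threshold stands in for Chi2's automatic, inconsistency-driven threshold schedule.

```python
# Hedged ChiMerge-style sketch (not the full Chi2 algorithm): merge adjacent
# intervals while their class distributions are statistically similar.
from collections import Counter

def chi2_stat(a, b, classes):
    """Chi-square statistic of the 2 x |classes| table for intervals a, b."""
    total = sum(a.values()) + sum(b.values())
    stat = 0.0
    for row in (a, b):
        row_n = sum(row.values())
        for c in classes:
            col_n = a.get(c, 0) + b.get(c, 0)
            expected = row_n * col_n / total
            if expected > 0:
                stat += (row.get(c, 0) - expected) ** 2 / expected
    return stat

def chimerge(values, labels, threshold=2.71):  # ~ chi2 critical, df=1, alpha=0.1
    classes = sorted(set(labels))
    points = sorted(set(values))
    # one interval per distinct value, each holding its class counts
    counts = [Counter(l for v, l in zip(values, labels) if v == p)
              for p in points]
    intervals = [[p] for p in points]
    while len(intervals) > 1:
        stats = [chi2_stat(counts[i], counts[i + 1], classes)
                 for i in range(len(intervals) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] > threshold:
            break  # all adjacent pairs differ significantly; stop merging
        intervals[i] = intervals[i] + intervals.pop(i + 1)
        counts[i] = counts[i] + counts.pop(i + 1)
    return [iv[0] for iv in intervals]  # lower bound of each final interval
```

Chi2 proper repeats this with a progressively relaxed threshold and stops when the discretized data would become inconsistent (identical instances with different classes); that outer loop is omitted here.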
Feature Selection via Discretization
Discretization can turn numeric attributes into discrete ones. Feature selection can eliminate some irrelevant and/or redundant attributes. Chi2 is a simple and general algorithm that uses the χ² statistic to discretize numeric attributes repeatedly until some inconsistencies are found in the data. It achieves feature selection via discretization. It can handle mixed attributes, work with mul...
Discretizing Continuous Attributes Using Information Theory
Many classification algorithms require that training examples contain only discrete values. In order to use these algorithms when some attributes have continuous numeric values, the numeric attributes must be converted into discrete ones. This paper describes a new way of discretizing numeric values using information theory. The amount of information each interval gives to the target attribute ...
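The information-theoretic criterion these abstracts describe can be sketched as the classic entropy-based binary split: choose the boundary that minimizes the weighted class entropy of the two resulting intervals, i.e. the boundary that gives the most information about the target attribute. This is a hedged illustration of the split step only; the usual stopping criterion (e.g. MDL-based, as in Fayyad–Irani discretization) is omitted.

```python
# Hedged sketch of entropy-based discretization: one binary split chosen by
# minimum weighted class entropy (equivalently, maximum information gain).
from math import log2

def entropy(labels):
    """Shannon entropy of the class distribution in `labels`."""
    n = len(labels)
    return -sum(p * log2(p)
                for p in (labels.count(c) / n for c in set(labels))
                if p > 0)

def best_entropy_cut(values, labels):
    """Return the cut point minimizing weighted entropy, or None if no cut helps."""
    pairs = sorted(zip(values, labels))
    vs = [v for v, _ in pairs]
    ls = [l for _, l in pairs]
    n = len(vs)
    best_cut, best_e = None, entropy(ls)  # baseline: no split
    for i in range(1, n):
        if vs[i] == vs[i - 1]:
            continue  # cut points only between distinct values
        e = (i / n) * entropy(ls[:i]) + ((n - i) / n) * entropy(ls[i:])
        if e < best_e:
            best_cut, best_e = (vs[i - 1] + vs[i]) / 2, e
    return best_cut
```

Applied recursively to each resulting interval, this yields the multi-interval entropy-based discretization used as a baseline in the main article above.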